All data

Regular PCA

Weighted PCA (scone and scde)

Imputed data

Something is clearly wrong with this result. I am not sure whether the weights are incorrect or the average is. I suspect the latter, given that we are estimating a single fixed mean per gene in a heterogeneous population.
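To make the concern about the average concrete, here is a minimal sketch of how a weighted per-gene mean can differ from the plain mean used for centering. It is not the scone or scde implementation. The function name `weighted_center`, the toy matrix, and the weights in [0, 1] (low weight meaning a likely dropout) are all assumptions made for illustration.

```python
import numpy as np

def weighted_center(X, W):
    """Center each gene (column) of X by its weighted mean.

    X: cells x genes expression matrix (e.g. log counts).
    W: cells x genes weights; a low weight marks a likely dropout (assumption).
    Returns the centered matrix and the per-gene weighted means.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    wsum = W.sum(axis=0)
    # Guard against genes whose weights are all zero.
    wsum = np.where(wsum > 0, wsum, 1.0)
    mu = (W * X).sum(axis=0) / wsum
    return X - mu, mu

# Toy data: one gene, two cells expressed at 4.0, two zeros flagged as dropouts.
X = np.array([[4.0], [4.0], [0.0], [0.0]])
W = np.array([[1.0], [1.0], [0.0], [0.0]])

_, mu_weighted = weighted_center(X, W)
mu_plain = X.mean(axis=0)
```

In this toy case the plain mean is pulled toward zero by the flagged cells, while the weighted mean ignores them. Either way it is still one mean per gene, so if the population contains distinct subgroups, neither version is centered correctly for all of them.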

Correlation with sample quality

Marker genes

In the original publication, they provide a heatmap of the top genes, selected by their loadings on PC1–3. We should do the same with wPCA and see whether we get better results.
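A minimal sketch of the gene selection step, assuming that "top genes" means the genes with the largest absolute loading on each of the first three PCs (the publication's exact criterion may differ). It uses ordinary, unweighted PCA via SVD. For wPCA, the loadings from the weighted fit would take the place of `Vt`. The function name `top_loading_genes` and the toy matrix are hypothetical.

```python
import numpy as np

def top_loading_genes(X, gene_names, n_pcs=3, n_top=50):
    """Return the n_top genes with the largest absolute loading on each PC.

    X: cells x genes expression matrix.
    gene_names: one name per column of X.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each gene
    # Rows of Vt are the principal axes (gene loadings), ordered by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = {}
    for k in range(min(n_pcs, Vt.shape[0])):
        order = np.argsort(-np.abs(Vt[k]))[:n_top]
        top[f"PC{k + 1}"] = [gene_names[i] for i in order]
    return top

# Toy data: gene g0 varies most, g1 second, g2 least.
X = np.array([
    [10.0, 1.0, 0.0],
    [-10.0, 1.0, 0.0],
    [10.0, -1.0, 0.5],
    [-10.0, -1.0, -0.5],
])
top = top_loading_genes(X, ["g0", "g1", "g2"], n_pcs=3, n_top=1)
```

The union of the per-PC gene lists gives the rows for the heatmap.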

High coverage

Here, we repeat the analysis separately for the high-coverage data only and for the low-coverage data only.